Make active_work match records_per_batch #1316

Merged · 7 commits · Sep 27, 2024

Conversation

@andyleiserson (Collaborator) commented Sep 26, 2024

Adjust active_work to match records_per_batch. If it is less, we will stall, since every record in the batch remains incomplete until the batch is validated. More might be okay, but making it the same seems simpler for now.

This change also modifies an existing DZKP proptest to be more extensive. I believe the test is capable of exposing both the hang when active_work is less than batch size (when run without the fix in this PR), and the hang due to read_size not dividing active work (not yet fixed).

I moved a couple of tests that do DZKPs with various BA types from secret_sharing::vector::impls to the DZKP module. My intent was to adapt them to cover the interaction with record size, but I haven't done that yet. (The move is in a separate commit, although in the current state I don't think there's an interesting diff on top of the move anyway.)

  • Record size tests
  • Tune proptest runtime for CI

@akoshelev (Collaborator) left a comment:

lgtm, but it seems some tests are broken.

}

impl<'a> DZKPUpgraded<'a> {
    pub(super) fn new(
        validator_inner: &Arc<MaliciousDZKPValidatorInner<'a>>,
        base_ctx: MaliciousContext<'a>,
    ) -> Self {
        // Adjust active_work to be at least records_per_batch. If it is less, we will
@akoshelev (Collaborator): we should probably log the fact that we've adjusted it - maybe at the debug level.
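A sketch of what that logging might look like, assuming the tracing crate is available; the helper name and message wording are illustrative, not the code that was actually merged:

use std::num::NonZeroUsize;

// Illustrative helper: raise active_work to records_per_batch, emitting a
// debug-level log only when an adjustment actually happens.
fn adjusted_active_work(
    configured: NonZeroUsize,
    records_per_batch: usize,
) -> NonZeroUsize {
    if configured.get() < records_per_batch {
        tracing::debug!(
            "adjusting active_work from {} to {records_per_batch}",
            configured.get(),
        );
    }
    NonZeroUsize::new(configured.get().max(records_per_batch))
        .expect("max with a non-zero value is non-zero")
}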


#[tokio::test]
async fn select_semi_honest() {
    test_select_semi_honest::<BA8>().await;
@akoshelev (Collaborator): is it worth testing it for weird types like BA3 and BA7 as well?

@andyleiserson (Collaborator, Author):
Yes, I think so. I had BA4 but took it out because it's not boolean_vector! (not for any good reason other than we haven't needed it). But I think it is worth having a less-than-one-byte case, and maybe even adding a new BA type so we can cover the between-one-and-two-bytes case.
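For illustration, the less-than-one-byte coverage discussed here might look like the following, assuming BA3 and BA7 implement the traits test_select_semi_honest requires (the test name is hypothetical):

#[tokio::test]
async fn select_semi_honest_narrow_types() {
    // Covers bit widths below one byte, per the review suggestion above.
    test_select_semi_honest::<BA3>().await;
    test_select_semi_honest::<BA7>().await;
}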

    (1usize << log_count, 1usize << log_multiplication_amount)
}

fn record_count_strategy() -> impl Strategy<Value = usize> {
    prop_oneof![1usize..=512, (1usize..=9).prop_map(|i| 1usize << i)]
@akoshelev (Collaborator): I am being stupid and it is 12pm now - is it really testing $2^{512}$ records?

@andyleiserson (Collaborator, Author):

What it does is:

  • 50% of the time it picks a random power of two between 2 and 512 (9 possible values)
  • 50% of the time it picks a random integer between 1 and 512 (512 possible values)

Two reasons to focus on powers of two, although I didn't put a huge amount of thought into this (e.g. it just occurred to me to add $2^0$):

  • Because we plan to restrict batch sizes to powers of two.
  • As the count increases, it makes sense to me to sample the space more sparsely; there's relatively less of interest in testing batch sizes 399, 400, and 401 than in testing batch sizes 7, 8, and 9.

I think it is clearer to have the two prop_oneof cases on different lines, but rustfmt insisted on a single line.
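For readability, an equivalent spelling with the two cases on separate lines might be (#[rustfmt::skip] is one hypothetical way to keep rustfmt from collapsing it back to a single line):

use proptest::prelude::*;

#[rustfmt::skip]
fn record_count_strategy() -> impl Strategy<Value = usize> {
    prop_oneof![
        // 50% of the time: any record count from 1 through 512.
        1usize..=512,
        // 50% of the time: a power of two from 2^1 through 2^9, i.e. 2 through 512.
        (1usize..=9).prop_map(|i| 1usize << i),
    ]
}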

// Adjust active_work to be at least records_per_batch. The MAC validator
// currently configures the batcher with records_per_batch = active_work, which
// makes this adjustment a no-op, but we do it this way to match the DZKP validator.
let records_per_batch = batch.lock().unwrap().records_per_batch();
@akoshelev (Collaborator):
would it be better to assert this to make sure we don't miss the misalignment between MAC and ZKP?

@andyleiserson (Collaborator, Author):
Sure, I'm fine with that.
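In the context of the excerpt above, the agreed assertion might look like this (a sketch; the exact form in the merged code may differ):

let records_per_batch = batch.lock().unwrap().records_per_batch();
// Fail fast if the MAC batcher's records_per_batch ever diverges from
// active_work, instead of silently adjusting past the mismatch.
assert_eq!(
    records_per_batch,
    ctx.active_work().get(),
    "MAC validator batch size should match active_work",
);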

@@ -217,7 +217,7 @@ impl<'a, F: ExtendableField> BatchValidator<'a, F> {

         // TODO: Right now we set the batch work to be equal to active_work,
         // but it does not need to be. We can make this configurable if needed.
-        let records_per_batch = ctx.active_work().get().min(total_records.get());
+        let records_per_batch = ctx.active_work().get();
@andyleiserson (Collaborator, Author):
Reducing if larger than total_records may have been necessary with an earlier version of the batcher, but the current version should take care of that internally, so I removed it here. No relation to the rest of these changes though.

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 98.48485% with 3 lines in your changes missing coverage. Please review.

Project coverage is 93.52%. Comparing base (1521f8b) to head (d2512f1).
Report is 20 commits behind head on main.

Files with missing lines                          Patch %   Lines
ipa-core/src/protocol/context/dzkp_malicious.rs   80.00%    2 Missing ⚠️
ipa-core/src/protocol/context/malicious.rs        87.50%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1316      +/-   ##
==========================================
+ Coverage   93.45%   93.52%   +0.07%     
==========================================
  Files         207      208       +1     
  Lines       33451    33802     +351     
==========================================
+ Hits        31260    31613     +353     
+ Misses       2191     2189       -2     


    for<'a> Replicated<V>: BooleanArrayMul<DZKPUpgradedSemiHonestContext<'a, NotSharded>>,
    Standard: Distribution<V>,
{
    let world = TestWorld::default();
@akoshelev (Collaborator): I just realized that we probably want to use the same seed for the rng and the test world (it is supported via the TestWorldConfig struct). That way we can make the test reproducible if it ever fails.

@andyleiserson (Collaborator, Author):
I made an issue for this. It's definitely worth doing in general, but it doesn't seem all that important for this particular test case, where the input values should be unrelated to the behavior of the test.
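For reference, a rough sketch of the suggestion: seed both the test world and the input rng from a single value. The seed field and the TestWorld::new_with constructor are assumptions based on the comment above, not verified signatures:

use rand::{rngs::StdRng, SeedableRng};

let seed = 42;
// Hypothetical: TestWorldConfig is assumed to expose a seed, per the
// reviewer's note that seeding is supported via this struct.
let config = TestWorldConfig {
    seed,
    ..TestWorldConfig::default()
};
let world = TestWorld::new_with(config);
let mut rng = StdRng::seed_from_u64(seed);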

@andyleiserson (Collaborator, Author):

The newest commit is to fix the failure in https://github.com/andyleiserson/ipa/actions/runs/11061292252/job/30733595263.

@andyleiserson changed the title from "Make active_work at least records_per_batch" to "Make active_work match records_per_batch" on Sep 27, 2024
@andyleiserson (Collaborator, Author):

Draft runs:

  • 100k: https://draft-mpc.vercel.app/query/view/agile-book2024-09-27T1606, successful
  • 2M: https://draft-mpc.vercel.app/query/view/aided-icon2024-09-27T1612, stalled

I suspect the issue is that, without the changes to make batch sizes powers of two and to align the read size with batch sizes, we can't remove the change that uses active_work as the batch size for attribution.

@andyleiserson (Collaborator, Author) commented Sep 27, 2024

Draft result for 2M records at d2512f1: https://draft-mpc.vercel.app/query/view/drunk-widow2024-09-27T1647

It failed due to #1308. But when I added the workaround (that was 2333492), CI failed due to excessive memory allocation. The 2M draft run at 2333492 passed: https://draft-mpc.vercel.app/query/view/iron-input2024-09-27T1735.

@andyleiserson merged commit 59ca3e7 into private-attribution:main on Sep 27, 2024 (22 checks passed)
@andyleiserson deleted the active-work branch on September 27, 2024 at 18:33
@akoshelev (Collaborator):

> CI failed due to excessive memory allocation.

Can you please post a link? It's the first time we've failed on OOM and I want to quickly check whether we're missing something.

@andyleiserson (Collaborator, Author):

Here's the failure with large batch size for breakdown reveal aggregation: https://github.com/private-attribution/ipa/actions/runs/11075152193/job/30775478393
